Long divergent haplotypes introgressed from wild sheep are associated with distinct morphological and adaptive characteristics in domestic sheep

The worldwide sheep population comprises more than 1000 breeds. Together, these exhibit a considerable morphological diversity, which has not been extensively investigated at the molecular level. Here, we analyze whole-genome sequencing individuals of 1,098 domestic sheep from 154 breeds, and 69 wild sheep from seven Ovis species. On average, we detected 6.8%, 1.0% and 0.2% introgressed sequence in domestic sheep originating from Iranian mouflon, urial and argali, respectively, with rare introgressions from other wild species. Interestingly, several introgressed haplotypes contributed to the morphological differentiations across sheep breeds, such as a RXFP2 haplotype from Iranian mouflon conferring the spiral horn trait, a MSRB3 haplotype from argali strongly associated with ear morphology, and a VPS13B haplotype probably originating from urial and mouflon possibly associated with facial traits. Our results reveal that introgression events from wild Ovis species contributed to the high rate of morphological differentiation in sheep breeds, but also to individual variation within breeds. We propose that long divergent haplotypes are a ubiquitous source of phenotypic variation that allows adaptation to a variable environment, and that these remain intact in the receiving population probably due to reduced recombination.


Introduction
The importance of introgression has been recognized as the mounting evidence accumulate. Candidate genes with introgression signatures in human, animals and plants involved broad functional categories [1,2], including defence against pathogen [3][4][5], pigmentation [6][7][8], altitude adaptation [9][10][11], substance metabolism [3][4][5]12] and other uncharacterized functions. Among them, adaptive introgression that facilitates the adaptation to a diversity of environment is particularly important. Recent studies have identified several intriguing examples of adaptive introgression in domestic animals, that probably have an important influence on their domestication and evolution, such as goat, sheep and cattle [4,5,13]. Adaptive introgression generally undergone selection, resulting in high frequency or fixed in particular populations but absent or low frequency in other populations. Such selection against introgressed fragments in functional regions facilitated the phenotypic differences among distinct populations.
The Ovis genus, including seven recognized wild species (snow sheep, O. nivicola; bighorn, O. canadensis; thinhorn, O. dalli; argali, O. ammon; urial, O. vignei; Asiatic mouflon, O. orientalis and European mouflon, O. musimon) and only one domestic species, possess intricate evolutionary history and pronounced gene flow events. The admixture events among these Ovis species have been repeatedly documented previously [5,11,[14][15][16]. However, most reports focused on isolated cases of gene flow between two sympatric Ovis species, e.g. the introgression from European mouflon into European domestic breeds [5,14], from Iranian mouflon into domestic sheep [15], and from argali to Tibetan sheep [11]. Moreover, some of these studies based on the sheep 50K SNP BeadChip considered only a limited number of variants to evaluate the introgression proportions [11,14]. Given these recurrent findings of interspecies introgression, it would be preferable to jointly infer the magnitude of such introgression across the whole genus, as pairwise introgression results can be biased by ignoring the presence of other introgression events in such reticulated evolution scenarios. Nonetheless, these studies have yielded interesting evidence for introgression of functional genes, such as the HBB locus as adaptation to the high-altitude of the Qinghai-Tibetan plateau [11].
Domestic sheep (Ovis aries) descends from Asiatic mouflon [17,18] approximately 11,000 years ago in southeastern Anatolia of Turkey. As many as 1,400 different breeds [19] exhibit a remarkable phenotypic diversity in response to selection pressures from various environments as well as to human selection. How did these distinct phenotypes form in such a short period after domestication, and how were these phenotypic variations affected by introgression remain largely explored. A comprehensive understanding of the genome-wide influence of introgression from wild relatives into domestic sheep on phenotypes or traits is lacking.
To excavate the impact of introgression on diverse phenotypic traits of sheep, here we build a collection of 1,167 whole-genome resequenced sheep (Fig 1 and S1 Table) with 156 samples were newly generated (S2 Table). We phased the genomes into haplotypes for an integrative analysis of the introgression from different wild sheep species. We further collected genotypes and phenotypes from East-Friesian sheep × Hu Sheep F2 hybrids to annotate the potential functional impact of various introgression signals. Our results provide further insight into the reticulated history of sheep evolution and particularly into the role of divergent haplotypes in the phenotypic diversity.

Genetic variant data and phylogenetic relationships of Ovis genus
To investigate the phylogeny and population differentiation of Ovis species, we collected and generated a whole genome SNP dataset from 1,167 individuals comprising 1,098 domestic sheep across the geographic distribution of 154 breeds and 69 samples of their seven wild relatives (S1 Table). After aligning reads to the Oar_v4.0 (GCF_000298735.2) and quality control, a total of 83,386,953 SNPs were detected.
A whole-genome maximum likelihood (ML) phylogenetic tree revealed that European mouflon is intermediate between Asiatic mouflon and domestic sheep (Figs 2A and S1). This  Fig 2B). The PCA of mouflon

PLOS GENETICS
and domestic sheep as well as the ADMIXTURE pattern at k�7 (Figs 2F and S3) shows a differentiation of eastern and western Iranian mouflon according to their geographic origin (Fig 2B). Moreover, the PCA confirms the relatively close relationship of western Iranian mouflons and domestic sheep. Both PCA and ADMIXTURE at k = 8 reveal a correlation of genetic clustering and geographic distances for domestic sheep (Figs 2D, 2F and S3). Samples from China were subdivided into three groups (Fig 2E and 2F)

Introgressions from wild relatives into domestic sheep
To evaluate the admixture proportion and locate the putative introgressive fragments in domestic populations from their wild relatives, we performed local ancestry inference (LAI) method program LOTER for each fully phased sheep genome. The bighorn, thinhorn, argali, urial, Iranian mouflon and European domestic sheep were used as source populations. European domestic sheep, which has not been in contact with the Asian wild sheep populations following their divergence, shared only few alleles with wild species (S4B Fig) and was used as the non-introgressed reference population. The European mouflon was not tested as a source population due to its close relationship with domestic sheep (Fig 2), which would confound the detection of introgression from the other wild sheep species.
In order to distinguish the putative signals of introgression from shared ancestral polymorphisms (incomplete lineage sorting, ILS), we calculated the expected length L of ILS tracts (see Materials and Methods) using variable recombination rates and removed the inferred introgressed segments with a length < L. This could remove some short introgressed regions, but is justified by the expectation that introgressed regions are considerably longer as they had less time to be broken up by recombination. In addition, we were mostly concerned about long introgressed haplotypes in the present study.
Using the filtered results, we calculated the genome-wide proportions of admixture. We detected an average of 10,036 segments (5,600-13,057), in total corresponding to an average of 180 Mb of wild Ovis sequence (range 96-224 Mb, SD = 23Mb) for each haploid domestic sheep genome. The average proportions of domestic sheep genome from Iranian mouflon, urial, argali, bighorn and thinhorn sheep were 6.8% (3.8-8.5%), 1.0% (0.5-1.4%), 0.2% (0.07-0.3%), 0.03% (0.01-0.05%) and 0.01% (0.006-0.022%), respectively (Figs 3A, S6 and S7), values that are similar to those previously reported for sympatric wild-to-domestic introgression. The introgressed proportions varied considerably across wild donor species, in particular between Iranian mouflon and the other wild species (Fig 3A). The domestic sheep from Asia had a significant higher percentage of Iranian mouflon lineage than those from Americas (S8 Fig). More detailed statistics revealed that domestic sheep from Iran shared more variations with Iranian mouflon compared with domestic sheep from Turkey (S8 Fig). Such significant statistical differences across the whole genome were less likely produced by ILS, supporting the introgression from Iranian mouflon to domestic sheep. East Asian domestic sheep has a relatively strong introgresssion from urial and argali (Fig 3A), consistent with their biogeographic history.
We further computed the modified f-statistic value [24] for each 50-kb window with a 20-kb step across the genomes in the form f d (European domestic sheep, domestic population; wild source of introgression, goat) (Fig 3C-3E). We grouped the domestic samples into 16 focal populations (see Materials and Methods). For each population, the regions with significant f d values (P < 0.001) were defined as potentially introgressed regions [11,25]. We further estimated d XY , phylogenetic trees and haplotype networks to corroborate the signals of introgression in specific regions.

Selection and adaptive signatures for introgressed segments
We focused on those introgressed haplotype blocks that are conserved within but not across populations, since they are most likely involved in population differentiation and adaptation to local habitats or selection [2]. For this, we calculated allele frequencies of the introgressed fragments in 16 domestic meta-populations (see Materials and Methods). Next, we defined 483 mouflon, 5 urial and no argali outlier haplotypes, putatively introgressed on the basis of their length (�100 kb), their total frequency (� 0.05) and frequency variation in the 16 metapopulations (> 0.1 standard deviation) (Fig 3B).
In order to detect fragments that are significantly differentiated between domestic and wild populations, we plotted F ST for each of the 16 meta-populations to the Iranian mouflon across the genome in 50-kb windows with a 20-kb step size (S5 Fig). With a P <0.001 (Z test) F ST cutoff and joining the windows that were separated by a distance of �50 kb, we obtained 2,305 non-overlapping regions. These blocks were slightly but significantly longer than the general blocks (S9 Fig). For the 488 mouflon and urial introgression outliers above, 116 and 3 overlapped with these highly differentiated regions (Fig 3B) and were here studied in more detail.
As expected, introgressed haplotype blocks are unevenly distributed among the domestic populations with a clear geographic signal. For instance, in region chr2: 109,998,387-110,183,036, the frequency of the Iranian mouflon derived haplotype is high in Iran local breeds (0.60) and Tan sheep (0.61), but very low (0.00) in Australian Merino and several Chinese breeds such as Hu sheep, Ujimqin Sheep, and Valley Tibetan sheep. The longest introgressed urial haplotype chr9:77,117,407-77,437,296 has a high frequency in Tibetan sheep including Oula (0.75), Prairie (0.90) and Valley Tibetan (0.80) and is almost entirely absent in sheep from Africa, America, Oceania and the Middle East. Overall, these putatively introgressed regions contained 891 genes, of which 883 and 8 were within the haplotypes derived from Iranian mouflon and Urial, respectively. It is noteworthy that within subset of introgressed regions that we identify as highly differentiated regions, several genes have been associated with morphological traits, particularly in facial shape. RXFP2 was strongly associated with sheep horn morphology [26,27]

Introgressed RXFP2 affects horn status
There are three main types of horn status in sheep (1) horned males and females ("horned"); (2) horned males, polled females ("sex-specific"); (3) polled males and females ("polled") [41,42]. A previous study indicated that the "horned" haplotypes in Tibetan sheep within RXFP2 was most likely introgressed from argali [11]. However, in our study the same region (chr10: 29,435112-29,481,215) was detected as introgression from Iranian mouflon not argali (Fig 3B). Furthermore, we found that this introgressed region showed significant high F ST (P< 0.001) in breeds with different horn status, and arose long-stretched LD block (Figs 4A, S10, and S11). LAI indicates that most haplotypes in breeds with horn status (3) contain haplotypes most closely related to those of Iranian mouflon (Fig 4B-4D), pointing to a possible origin of this phenotype from this wild sheep species. We further investigated in detail the pattern of haplotype variation in this region.
Haplotype patterns in this region across all 1,167 sheep showed three major highly divergent haplogroups (hap-a, hap-b and hap-c), with a few other diverse or recombinant haplotypes at low frequency (Figs 4C and S12 Fig). Hap-a is the dominant haplotype in domestic  (Figs 4C and S12) and is completely fixed (frequency = 1) in Finnsheep (n = 12), Gotland (n = 10), Waggir (n = 9), Afshari (n = 6), and East Friesian sheep (n = 10) (S13 Fig and S4  Table), all of which have the "polled" phenotype. Intriguingly, hap-a is present as heterozygotes in two Iranian mouflon samples (Fig 4C). We speculated that polledness had likely occurred in wild sheep progenitors, possibly as recessive trait, and rapidly became widespread in domestic sheep because it was under strong selection in a domesticated setting.
Hap-b is generally found at high frequency in breeds with the "sex-specific" horn phenotype, including Chinese Merino (25/40, 0.625), Ouessant (12/20, 0.6) and Barki sheep (5/6, 0.83). Hap-c in contrast is usually at high frequency in breeds with the "horned" phenotype, including Oula, Prairie Tibetan, Valais Blacknose and Scottish Blackface sheep (S13 Fig). Fig 4D shows a network of intact non-recombined haplotypes in the~46-kb region around RXFP2 from wild and domestic sheep. The network suggests that haplogroups corresponding to Hap-a, Hap-b and Hap-c, respectively, are all linked to haplotypes that occur in Iranian mouflon. Two possible explanations remained either by introgression from Iranian mouflon or variation that has persisted since domestication. The former was more likely as the probability to be ILS for a 46,103 bp haplotype was extremely low (2.03×10 −5 ). In the network and in the ML tree (S14 Fig), Hap-a and Hap-b haplotypes of mouflon are intermingled with those of urial, so the introgressed fragments possibly originated from urial and were introgressed into domestic sheep via the mouflons. The Valais Blacknose and Scottish Blackface (European domestic breeds) haplotypes were assigned to the "horned" phenotype cluster, validating the earlier introgressed time for this locus as well. Nucleotide difference between the Iranian mouflon haplotypes and hap-c (Fig 2B) suggest that hap-c was introgressed from a mouflon subpopulation of Asiatic mouflon that has not yet been sampled or were the new mutations since the time of introgression. In spite of these facts, there was still possibility that the direct ancestor of domestic sheep had all three haplotypes (a, b and c), because it is really difficult to distinguish introgression from ILS. Analysis of non-silent mutations (S15 Fig) did not reveal a single causative mutation, but variant chr10: 29,439,011 has the highest correlation with the phenotype. A~1.8-kb insertion in the 3' UTR region of RXFP2 has also been identified to be a putatively causal mutation for horn status [43,44]. But it was not completely linked with horn phenotypes in all the domestic breeds or populations [43]. More efforts are needed to construct the relationship between these mutations and horn status, and to explore how these variations regulate horn status.

Ear morphology influenced by introgressed MSRB3
Another prominent introgressed region with high F ST contains MSRB3, encoding methionine sulfoxide reductase B3 (Figs 5A-5C and S16). Interestingly, ear morphology has been mapped to MSRB3 in sheep using breeds fixed for divergent ear types [39], designated as ear size (large-eared vs. small-eared) and ear erectness (drop-eared vs. prick-eared). This gene yielded significant f d values in 9 pairwise comparisons of argali vs. domestic population, encompassing chr3:154,000,001-154,090,000 (Fig 5A). This was confirmed by the absolute divergence d XY of argali and Oula, and of argali and Prairie Tibetan populations. (Fig 5B), which indicated introgression rather than shared ancestry (ILS) [24]. By contrast, the d XY of Iranian mouflon and populations for each 50-kb window. Gene annotations in the selected region in Oar_v4.0 are indicated at the bottom. (B) LAI within RXFP2 in Valais Blacknose, Scottish Blackface, Oula and Prairie Tibetan sheep illustrating mosaic patterns of source population. (C) The haplotype pattens of RXFP2 introgressed region (Oar_v4.0 chr10: 29,436,086-29,466,717). Each row is a phased haplotype, and each column is a polymorphic SNP variant. The refence and alternative alleles are indicated by light yellow and red, respectively. The haplotypes present in Iranian mouflons are indicated separately. Photo credits are showed in S7 Table. (D) A haplotype network generated by the R software package PEGAS based on 221 SNPs of 666 haplotypes.
https://doi.org/10.1371/journal.pgen.1010615.g004 A haplotype plot of the~32-kb MSRB3 region across 1,167 individuals (Figs 5D and S18) group into three main haplogroups, denoted as hap-I, hap-II and hap-III. These three haplogroups were corroborated by haplotype network and ML tree, in which domestic sheep haplotypes assigned to three clusters (S20 and S21 Figs). The hap-II cluster is close to argali haplotypes (S20 and S21 Figs), consistent with it being introgressed from argali. Furthermore, hap-II has the highest frequency in domestic sheep (1109/1831, 0.605), and is fixed in Finnsheep (n = 24), Hanzhong (n = 10), Tibetan Oula (n = 28), Feral (n = 6) and Old Spael sheep, and nearly fixed (�0.95) in Cameroon, Gotland, Ouessant sheep (S19 Fig and S5 Table). Due to its high frequency among sheep breeds, we speculated this haplogroup was likely to confer an adaptive advantage over the other two groups. Intriguingly, all the European mouflon (n = 3) in this study were likewise fixed for hap-II, suggesting that this introgression probably occurred before the first wave of migration of sheep into European breeds [20,45]. The frequency of hap-I is relatively low across all domestic sheep (155/1831, 0.085), but has a high frequency in Swiss White Alpine For a more controlled analysis of a link between MSRB3 variants and ear size, we used an F2 East-Friesian × Hu sheep hybrid population. We performed genome-wide association study (GWAS) of all the external ear traits, including measured width and length, in F2 hybrids (n = 323) (Figs 5E and S22). The analysis of ear width revealed a single significant association peak located in MSRB3 (Fig 5E and S6 Table), but there was no significant signal associated with ear length (S22 Fig) or the other ear traits. Crossbred individuals with different haplotype combinations (Fig 5F) or different genotypes of diagnostic SNPs (S23 and S24 Figs) displayed significant difference in ear width.

Complex patterns of introgressed regions within VPS13B
Another strong introgression signal was found in VPS13B (vacuolar protein sorting 13 homolog B), which showed the most significant introgressed signals from urial according to LAI (Figs 3B and S25), as well as several consecutive outlier windows in the top f d values (P<0.001) (Figs 3D and 6A). VPS13B is a large gene spanning about 800 kb and has a complex structure with 50 exons and 6 alternatively spliced transcripts. It encodes a large protein with more than 4000 amino acids. The LAI results showed that there were two urial introgressed regions located in VPS13B chr9:77,117,407-77,437,296 (319.8 kb) and chr9:77,511,156-77,666,735 (155.5 kb) (Fig 3B), comprising 4 and 3 major haplogroups respectively and covering about 59% of the gene (Figs 6E, 6F, S27 and S28). In addition, there is another separate introgression signal derived from mouflon in this gene chr9:76,946,737-77016,847, with three haplogroups (Figs 3B, 6D and S26). The haplogroups in these three regions form five major haplotype combinations, at least one of which is a recombinant (Fig 6D-6F), and the dominant haplotype in the first region is tightly linked to one of the five more downstream haplogroup combinations. The compound introgressed haplotype appears to have high frequency in Tibetan sheep (Oula: 0.9, Prairie Tibetan: 1; Valley Tibetan: 0.75), while at low frequency in domestic sheep from Iran (0.125), Turkey (0.045), America (0.063) and Australia (0). These signals were also supported by d XY and F ST values which were lower between introgressed haplotypes and urial than mouflon, despite the closer phylogenetic position of the later to domestic sheep (Figs 6B, 6C and S29). Whereas, this pattern was almost undetectable in partial introgressed region. We built haplotype networks of each region to investigate in detail the donor of introgressed haplotypes, but due to intermixed haplotypes we cannot distinguish whether the donor was urial or Iranian mouflon (Figs 6G and S30-S32). Similar to RXFP2, urial and Asiatic mouflon probably share the VPS13B haplotypes, which precludes an identification of the origin of the introgressions into domestic sheep.
VPS13B is functionally relevant to numerous phenotypes and diseases. It causes Cohen syndrome in humans with diverse manifestations including microcephaly, craniofacial and limb anomalies. In addition, variations in VPS13B affect face morphology, particularly nose morphology in human and mice [31]. Although the role of the introgressed fragments in VPS13B in sheep cannot at this stage be functionally verified, observations in other species suggest it may play a role in the development of facial shape.

Discussion
In the present study, we performed a detailed investigation of introgression in sheep and evaluated the amounts of sequence introgressed from each wild relative into domestic sheep. We have focused on the most consequential breed-specific variants by selection of fragments of >100 kb with a significant high F ST (Z test, P < 0.001). We also present an in-depth investigation of three regions containing the genes RXFP2, MSRB3 and VPS13B, which have been introgressed from wild sheep and now occur in a substantial proportion of the global sheep population. We show how these haplotypes are associated with variation in several morphological variations in domestic sheep.

Wild-domestic introgressions
Consistent with the previous studies, we detected the pronounced gene flow from wild relative to domestic populations, and the pattern that average proportions of wild relative sequence decrease with the phylogenetic distance between wild sheep species and domestic sheep [5,16]. Whereas, the location of introgressed fragments and the range of introgression percentages showed differences between this and previous studies [5,16], we speculated such differences were mainly caused by the distinct statistics or software that we used to identify introgressions. Besides, the focused breeds and their geographic distribution were not exactly the same.
The strong signal of early sympatric gene flow of the Iranian mouflon into ancestral domestic sheep is geographically plausible and explains the high proportion of mouflon-derived sequences in domestic sheep. Domestic sheep have acquired urial DNA segments either directly from sheep breeds in the eastern distribution range of the urial (Fig 1) or indirectly via the Iranian mouflon population [46]. Subsequent dispersal has brought domestic sheep into contact with argali. The introgression from snow sheep had also been proved previously [47]. A considerable genetic overlap of Asiatic mouflon and urial [47,48] indicated incomplete speciation and/or mutual introgression. This has resulted in an incomplete differentiation of these species and does not allow a clear differentiation of Asiatic mouflon and urial as source of introgression of RXFP2 and VPS13B.
Accurate identification of donor species may depend on the availability of whole genome sequencing (WGS) data from wild species candidates, and on method used to infer it. Hu et al. proposed argali introgression into RXFP2, but did not test the Asiatic mouflon [11]. Our data, especially the haplotype network (Fig 4D), clearly indicate that, although the argali haplotype does resemble the introgressed haplotype (hap-c), hap-c has much closer affinity with haplotypes in Iranian mouflon. Moreover, hap-c is actually geographically widespread among domestic sheep, being found in both Tibetan, European and African sheep breeds, which has not been shown before (Figs 4D and S12 and S4 Table). This supports that introgression of this haplotype predated the global dispersal from the sheep domestication rather than much later and localized to the Tibetan Plateau.

Long divergent haplotypes contribute to diversity of sheep
We found that introgressed wild haplotypes covered about 8% of the sheep genome, and therefore contributed substantially to the diversity of domestic sheep, on the level of either individual or breed-specific variation. As indicated by Fig 3B, we focused on a small proportion of all introgressed regions, but fragments that are shorter than 100 kb have a more random distribution across the breeds (low SD of within-breed frequencies) and do not appear to have high F ST between all 16 breed groups and Asiatic mouflon. Despite this, they may still contribute to the overall diversity of domestic sheep.
Breed-specific introgression may well be related to local adaptation through their link to sheep phenotypes, e.g. hypoxia responses and high-attitude adaptation [11,16,23], resistance to pneumonia [5] and reproduction [49]. It would be a reasonable expectation that traits resulting from human selection [50,51] were only indirectly influenced by wild introgression, such as the different wool [52] and tail types [53]. However, the absence of horns, a typical domestic feature, corresponding to RXFP2 haplotypes is also detected in "horned" Asiatic mouflon. A testable hypothesis is that RXFP2 of wild sheep is involved in balanced selection controlling the size of the horns.
A common feature of this study and comparable studies of cattle and goats is the observation of introgressed long (50 kb or longer) divergent haplotypes [4,13,[54][55][56]. Divergence of homologous sequences inhibits recombination [57][58][59][60] which explains the absence of intermediates of the diverged haplotypes and allows to retain the divergence of the haplotypes. Structural variants (SVs) residing in these long extended introgressed haplotypes have played an important role in local adaptation in human [61][62][63]. More analyses were necessary in sheep and other domestic animals to dig up the introgressed adaptive SVs.
In conclusion, using whole-genome sequencing data of large-scale individuals, we clarified the phylogenetic relationship among the eight extant species in the Ovis genus. In addition, we generated a global admixture graph of wild relative in diverse domestic sheep populations and determined whether positive selection had acted on these fragments. We also highlighted three introgressive regions in RXFP2, MSRB3 and VPS13B. Through detailed haplotype and functional analyses, we evaluated the role of long divergent haplotypes from wild relatives in shaping the morphological traits of domestic sheep, which may be a ubiquitous phenomenon in animal evolution.  (Fig 1 and S1 and S2 Tables). Following standard library preparation protocols, we used at least 0.5 μg of genomic DNA for each sample to construct paired-end library with insert sizes from 300 to 500 bp. Sequencing was performed on the Illumina HiSeq X Ten platform with a mean coverage of 13 Table).

Population structure and phylogenetic analysis
We analyzed the population structure using 293 representative samples (S3 Table), including all wild species and 56 domestic breeds. We used genome-wide 332,990 fourfold degenerate (4DV) sites to construct a maximum likelihood (ML) phylogenetic tree using RAxML v8.2.9 [73] with the following parameters: -f a -x 123 -p 23 -# 100 -k -m GTRGAMMA ('-f a' to run a search for the ML tree and a rapid bootstrap analysis in one run, '-x' a random number seed for the ML search, '-p' a random number seed for the parsimony inference, '-#' the number of bootstrap, '-m' substitution models). The robustness of specific tree topology was tested by 100 bootstraps. The final tree topology was visualized using INTERACTIVE TREE OF LIFE (iTOL) [74], and rooted at the branch of goat (Figs 2A and S1).
Principal component analysis (PCA) of whole-genome SNPs using was performed with the SMARTPCA program in the package of EIGENSOFT v.6.0beta [75]. To clarify the relationship between wild populations and domestic sheep, we performed four separate PCA using different dataset: (1) 293 samples, with the first two principal components cumulatively explaining 20.91% of the total variance. (2) 267 individuals including 3 European mouflon, 31 Asiatic mouflon and 233 domestic sheep; (3) 233 domestic individuals sampling from eight different regions, including Africa, Americans, Australian, China, South Asia, European, Iran and Turkey; (4) 117 individuals from 11 Chinese breeds (Fig 2B-2D).
We used ADMIXTURE v1.23 [76] to infer K = 2 to K = 9 clusters of related individuals to estimate the ancestry of each individual and quantify genome-wide admixture. For each K, we ran ADMIXTURE 20 times and calculated the mean cross-validation (CV) error to determine the optimal group number, the minimum CV value among 20 repetitions of each K was taken as the final result (Figs 2F and S3).

Selective sweep analysis
To detect potential selective signals, we calculated F ST  . F ST was calculated in 50-kb sliding windows with 20-kb step size (S11, S16 and S29 Figs) using vcftools v.0.1.13 [77]. In each comparison, the top 1% genomic regions with the highest scores overlapped were considered to be potential selective signatures. We performed Z test, and then focused putatively selected regions on the windows with a significance level of P < 0.001. To further verify whether this threshold was appropriate, permutations were performed in F ST analysis (S5 Fig). We first randomly selected 33 individuals from all the domestic sheep dataset and mixed these samples with 33 Iranian mouflon together, and then randomly divided them into two new populations. Next, we calculated windowed F ST values between these two new populations (in sliding 50-kb windows with 20-kb steps). The maximum value across all windows were recorded and this process was repeated 100 times. Finally, we sorted the 100 maximum values from largest to smallest and selected the largest value (0.155) as the final permutation result.

Whole-genome analysis of genomic introgression
Estimation of introgression on population scale. We implemented D-statistics with DSUITE [78] across all combinations of the 16 species/populations defines as described above. The species/population tree required for DSUITE was constructed using Treemix [79] without assuming gene flow (-m 0) using goat as outgroup (S4 Fig). Then, D and f4-ratios of all populations were calculated with the DTRIOS module, and the results for each chromosome were combined with Dtrioscombine module. After that, D and f statistics were calculated for each branch of the population tree using the FBRANCH module, and visualized the statistical results using dtools.py script provided in the DSUITE software (S4 Fig). Because there were few alleles sharing between domestic sheep from Europe and the other wild species and domestic populations (S4 Fig), domestic European samples were identified as non-introgressed reference population.
Identification and localization of genomic introgression. We used local ancestry inference (LAI) implemented in LOTER [80], which uses phased data and has been shown to outperform other tools for more ancient admixture. We specified seven wild relatives (1 snow sheep, 3 bighorn and 2 thinhorn, 12 argali, 6 urial, 31 Iranian mouflon, 3 European mouflon), and European domestic sheep as reference population, in which European domestic sheep (n = 30) was the control population for domestic component. It was assumed that a haplotype of an admixed domestic individual consists of a mosaic of existing haplotypes from the eight reference populations. For each fragment, LOTER derives the most likely ancestral origin on the basis of allele frequencies of reference populations and the selected populations. We calculated introgression percentages from each of the wild relatives into the haploid genomes ( Fig  3A) and merge overlapping introgressed regions from the same source. Then, the frequencies in the 16 domestic groups with their standard deviations (SD) and ranges (max-frequency minus min-frequency) were calculated for each selected fragment (Fig 3B). f d in sliding windows. We computed the modified f-statistic (f d ) value [24] using a 50-kb sliding window with 20-kb step size in the form of f d (EU_OA, domestic populations; wild species, goat), where EU_OA represents the European domestic sheep (n = 30) and domestic populations include 16 populations described above. We evaluated the statistical significance using two-tailed Z-test. We calculated the P values according to Z-transformed f d values, and the windows with P < 0.001 was defined as potential introgressed regions (Fig 3C-3E). Mean pairwise sequence divergence (d XY ) [24] was also calculated for 50-kb windows with 20-kb steps across whole genome using same populations above (Figs 5B, 6B and S17).
Incomplete lineage sorting (ILS). In order to exclude common ancestry as explanation for the presence of introgressed fragments, we calculated the expected length of ancestral sequence shared by domestic sheep and each wild relative, respectively. The expected shared ancestral sequence length (L) is calculated as L = 1/(r/t), in which r is the recombination rate per generation per base pair (bp), and t is the length between wild relatives and domestic sheep since divergence. The probability of a length of at least m is 1-GammaCDF (m, shape = 2, rate = 1/L), in which GammaCDF is the Gamma distribution function and the numbers within the parenthesis are its arguments [9]. We used a generation time of 4 years [81], a recombination rate of 1.0×10 −8 per base pair (bp) per generation [27] and the following divergence times: 0.032 Ma between Iranian mouflon and domestic sheep, 1.26 Ma for urial and domestic sheep, 2.36 Ma for argali and domestic sheep, and 3.12 Ma for bighorn (or thinhorn) and domestic sheep [17,23,[82][83]. This gives expected lengths of L (Iranian mouflon) = 6,192 bp, L (urial) = 159 bp, L (argali) = 85 bp, and L (bighorn/thinhorn) = 64 bp. We then removed inferred introgressed fragments shorter than L, and calculated the total length of remaining introgressed tracks. The length distributions are showed in S7 and S8 Figs. Probabilities of length of observed introgressed regions were calculated by the R function pgamma using the local recombination rates estimated in previous study [84]. The probabilities are 2.03×10 −5 for 46.10 kb (RXFP2 introgressed region) and zero for 31.70 kb (MSRB3 argali-introgressed region), 319.89 and 155.58 (VPS13B urial-introgressed regions), and 1.37×10 −5 for 70.11 kb (VPS13B mouflon-inrogressed) (S8 Table).
Haplotype patterns and network. To view the specific genotypes patterns of the prominent introgressed regions, including RXFP2, MSRB3 and VPS13B, we extracted the phased SNPs in these regions from 1,167 whole-genome sequencing individuals and visualized specific genotypes patterns in a heatmap (S12, S18 and S26-S28 Figs). We also constructed haplotype networks of RXFP2, MSRB3 and VPX13B using R package PEGAS [85] based on the pairwise differences (Figs 4D, 6G, S21, and S30-S32). We screened and eliminated samples whose haplotypes were interrupted due to recombination and removed SNPs with minor allele frequency �5%. In the 46.3 kb RXFP2 introgressed region we retained 333 samples from 6 wild species and 20 domestic sheep breeds and 221 SNPs. In the 23 kb MSRB3 introgressed region we retained 202 SNPs in 201 individuals. We analyzed the sheep ear shapes of the corresponding varieties of three haplotypes, and defined three main haplotypes as hap-I, hap-II and hap-III.
Genome-wide association study. From an East-Friesian sheep × Hu Sheep F2 generation bred by Gansu Yuansheng Agriculture and Animal Husbandry Technology Co.,Ltd. (Jinchang, Gansu) 323 samples were collected. Phenotypes include ear length and width, birth weight and age. DNA was collected from blood samples and whole genomes were sequenced by Shijiazhuang Boruidi Biotechnology Co., Ltd (Shijiazhuang, Hebei) using 40K liquid chip generated by genotyping by target sequencing [86]. Raw fastq files were filtered using fastp [87], reads were mapped to Oar_rambouillet_v1.0 and the variation were summarized in a VCF file. We used PLINK1.9 [88] to remove samples with > 10% missing genotypes and SNPs with minor allele frequency < 0.05 and >10% missing scores, retaining 317 sheep and 209,625 SNPs. To improve variant density, we used BEAGLE5.0 [89] to impute genotype using reference panel size of 43 East Friesen sheep and 8 Hu sheep with default settings and removed SNP with DR 2 (dosage R-squared) � 0.8, resulting in a total of 647,471 SNPs.
GWAS was conducted using GEMMA(0.98.3) [90] with the linear mixed model: y is the phenotype of n×1 vector; W the n×c matrix of covariates including fixed effects; α the c-vector of the corresponding coefficients including the intercept; x is the n-vector of markers; β the effect size of the markers; u the n-vector of the random effect with u~MVN n (0, λτ −1 K); MVN n the n-dimensional multivariate normal distribution; λ the ratio between the two variance components; τ −1 is the variance of the residual errors; K represents the known n×n relatedness matrix calculated by SNP markers; ε the random error n-vector with ε~MVN n (0, τ −1 I n ), where I n denotes n×n identity matrix. To decrease false positive signals, the genomewide significance threshold was set to be 7.72×10 −8 (0.05/647,471) after the Bonferroni correction.